Abstract: Characterizing the phase transitions of convex optimizations in recovering structured signals or data is of central importance in compressed sensing, machine learning and statistics. The phase transitions of many convex optimization signal recovery methods, such as $\ell_1$ minimization and nuclear norm minimization, are well understood through recent years' research. However, rigorously characterizing the phase transition of total variation (TV) minimization in recovering sparse-gradient signals is still open. In this paper, we fully characterize the phase transition curve of TV minimization. Our proof builds on Donoho, Johnstone and Montanari's conjectured phase transition curve for the TV approximate message passing (AMP) algorithm, together with the linkage between the minimax mean square error of a denoising problem and the high-dimensional convex geometry of TV minimization.

I. Introduction 

In the last decade, using convex optimization to recover parsimoniously modeled signals or data from a limited number of samples has attracted significant research interest in compressed sensing, machine learning and statistics [1]-[4]. For example, in compressed sensing, the main idea is to exploit the sparse structures inherent to the underlying signal, and to design sparsity-promoting convex optimization programs, such as $\ell_1$ minimization, that efficiently recover the signal from a much smaller number of measurements than the ambient signal dimension. Numerical results empirically show that these convex optimization based signal recovery algorithms often exhibit a phase transition phenomenon: when the number of measurements exceeds a certain threshold, the convex optimization correctly recovers the structured signals with high probability; when the number of measurements is smaller than the threshold, the convex optimization fails to recover the underlying structured signals with high probability. A series of works studying convex geometry for linear inverse problems have made substantial progress in theoretically characterizing the phase transition phenomenon for convex optimizations in recovering structured signals [?]. For example, the phase transitions for $\ell_1$ minimization used in recovering sparse signals and for nuclear norm minimization used in recovering low-rank matrices are now well understood [2], [?].

In spite of all this progress, characterizing the phase transition of the total variation minimization used in recovering sparse-gradient signals is still open. Sparse-gradient signals are signals that are piece-wise constant, and thus have a small number of non-zero gradients. Signals of this type arise naturally in signal denoising and digital image processing [?]. Let $x^* \in \mathbb{R}^n$ be a vector representing a one-dimensional piece-wise constant signal, and let $Bx^*$ denote the finite difference of $x^*$, in which $(Bx^*)_i = x^*_i - x^*_{i+1}$ with $x^*_i$ being the $i$th element of $x^*$. Since $x^*$ has sparse gradients, $Bx^*$ has very few non-zero entries. Suppose one observes $y = Ax^*$, in which $A \in \mathbb{R}^{m \times n}$ is the observation matrix; then in the total variation (TV) minimization problem, one tries to recover $x^*$ from $y$ by solving

$$\min_{x} \|Bx\|_1 \quad \text{s.t.} \quad y = Ax. \tag{1}$$

Here, $\|Bx\|_1 = \sum_{i=1}^{n-1} |(Bx)_i|$ is called the total variation semi-norm of $x$.
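To make the TV semi-norm concrete, the following minimal NumPy sketch (our illustration, not code from the cited works) evaluates $\|Bx\|_1$ for a hypothetical piece-wise constant signal. It uses `np.diff`, which returns $x_{i+1} - x_i$; since only absolute values enter $\|Bx\|_1$, the sign convention of the finite difference does not matter here. The helper name `tv_seminorm` is hypothetical.

```python
import numpy as np

def tv_seminorm(x: np.ndarray) -> float:
    """Total variation semi-norm ||Bx||_1 = sum_i |x_i - x_{i+1}| of a 1-D signal."""
    return float(np.sum(np.abs(np.diff(x))))

# Hypothetical piece-wise constant signal with two jumps (n = 8).
x = np.array([1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 0.5, 0.5])
print(tv_seminorm(x))                  # |1 - 3| + |3 - 0.5| = 4.5
print(np.count_nonzero(np.diff(x)))    # number of non-zero gradients: 2

# A constant signal has zero total variation.
print(tv_seminorm(np.full(8, 7.0)))    # 0.0
```

The last line illustrates why $\|B\cdot\|_1$ is a semi-norm rather than a norm: it vanishes on every constant signal.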

TV minimization has a wide range of applications, including image reconstruction and restoration [?], medical imaging [?], noise removal [?], computing surface evolution [?] and profile reconstruction [?]. However, the understanding of the performance of TV minimization is less complete than that of other convex optimization based methods such as $\ell_1$ minimization. In particular, the phase transition of TV minimization has not been fully characterized and remains an open problem. In this paper, we solve this open problem by fully characterizing the phase transition of TV minimization. The starting points of our investigation are the results obtained in [?] and [?], which we discuss in detail in the following.

First, for a general signal recovery problem using a general proper convex penalty function $f(x)$, given as follows,

$$\min_{x} f(x) \quad \text{s.t.} \quad y = Ax, \tag{2}$$

the authors of [?] showed that the phase transition on the number of measurements happens at the Gaussian width of the descent cone of the proper convex penalty function $f(x)$. Using this result and earlier results from polyhedral geometry, researchers have fully characterized the phase transition thresholds for $\ell_1$ minimization and nuclear norm minimization by calculating the Gaussian widths of their descent cones. However, since the total variation semi-norm is a non-separable convex penalty term, calculating the precise Gaussian width of the descent cone of the total variation semi-norm is difficult and remains open. This difficulty in calculating the Gaussian width also prevents us from characterizing the phase transition of total variation minimization in recovering sparse-gradient signals.

Second, in [?], the authors first considered a denoising problem where the total variation regularizer is used to denoise sparse-gradient signals contaminated by additive Gaussian noise, and characterized the minimax MSE of this denoising method. The authors of [?] further proposed an approximate message passing algorithm for recovering sparse-gradient signals from undersampled measurements, and conjectured that the minimax MSE of the denoising problem is the same as the phase transition (the number of measurements) of the approximate message passing algorithm. Numerical results in [?] demonstrated that the empirical phase transitions of both the AMP algorithm and total variation minimization (1) match the minimax MSE of the denoising problem. However, justifying the conjecture in [?] requires the assumption that the state evolution of the approximate message passing algorithm is valid, which still remains to be proved. Furthermore, we do not know whether AMP and total variation minimization indeed have the same phase transition. In [?], the authors showed that the minimax MSE of the denoising problem considered in [?] is an upper bound on the phase transition (the number of needed measurements) of total variation minimization (as will be discussed later in this paper). However, it remains unknown whether the minimax MSE of the denoising problem is indeed the phase transition of total variation minimization.

As our main contribution in this paper, we rigorously prove that the minimax MSE of the TV-regularized denoising considered by [?] is indeed the phase transition of the TV minimization problem (1), by showing that the minimax MSE of the denoising problem is approximately equal to the Gaussian width of the descent cone of the TV semi-norm, up to a negligible additive constant. We remark that, unlike the Gaussian width, the minimax MSE of TV-regularized denoising can be readily computed. We can thus characterize the phase transition of total variation minimization using the minimax MSE of the denoising problem.

Here, we would like to compare our work with [?]. In [?], the authors gave upper and lower bounds on the number of measurements needed for recovering worst-case sparse-gradient signals that have a fixed number of nonzero elements in their gradient, using the tool of Gaussian width. In contrast, in this paper we focus on the phase transition for average-case sparse-gradient signals, where the number of nonzero elements in the signal gradient grows proportionally with the ambient signal dimension.

The remainder of the paper is organized as follows. In Section II, we introduce the background and set up the notation that will be used in later analysis and proofs. In Section III, we verify that the TV regularizer satisfies the weak decomposability condition in [?] and use this condition to fully characterize the phase transition of the TV minimization problem. In Section IV, we provide several concluding remarks.


II. Background 
A. Definitions and Notations 

We first introduce definitions and notations that will be used 
throughout the paper. 

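We use $f(x)$ to denote the TV regularizer $f(x) := \|Bx\|_1$, which is a semi-norm but not a norm, and $B \in \mathbb{R}^{(n-1)\times n}$ with

$$B_{ij} = \begin{cases} 1 & \text{if } j = i, \\ -1 & \text{if } j = i + 1, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$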
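The matrix in (3) and the product $BB^\top$, on which the proof of Lemma 1 relies, can be built explicitly as below. This is a sketch assuming 0-based NumPy indexing and the convention $(Bx)_i = x_i - x_{i+1}$ of (3); `difference_matrix` is a hypothetical helper name.

```python
import numpy as np

def difference_matrix(n: int) -> np.ndarray:
    """B in R^{(n-1) x n} as in (3): B[i, i] = 1 and B[i, i + 1] = -1 (0-based)."""
    B = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    B[idx, idx] = 1.0
    B[idx, idx + 1] = -1.0
    return B

n = 6
B = difference_matrix(n)
x = np.random.default_rng(0).standard_normal(n)

# (Bx)_i = x_i - x_{i+1}, i.e. the negative of np.diff(x).
assert np.allclose(B @ x, -np.diff(x))

# BB^T is (n-1) x (n-1) tridiagonal: 2 on the diagonal, -1 on both off-diagonals.
BBt = B @ B.T
expected = 2 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)
assert np.allclose(BBt, expected)
print(BBt)
```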

Let $\partial f(x)$ be the subdifferential of $f$ at $x$.

For a given non-empty set $C \subseteq \mathbb{R}^n$, the cone generated by $C$ is defined as

$$\mathrm{cone}(C) := \{\lambda x \in \mathbb{R}^n : x \in C,\ \lambda \ge 0\}. \tag{4}$$

The distance from a vector $g \in \mathbb{R}^n$ to the set $C$ is defined as

$$\mathrm{dist}(g, C) := \inf_{u \in C} \|g - u\|_2, \tag{5}$$

in which $\|\cdot\|_2$ is the $\ell_2$ norm.

The mean square distance to $C$ is defined as

$$D(C) := \mathbb{E}\{\mathrm{dist}(g, C)^2\}, \tag{6}$$

in which the expectation is taken over $g \sim \mathcal{N}(0, I)$ with $I$ being the identity matrix.
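Definitions (5) and (6) lend themselves to a Monte Carlo estimate whenever the Euclidean projection onto $C$ is available. The sketch below assumes $C$ is closed and convex and that the caller supplies its projection; the nonnegative-orthant example, for which $D(C) = n/2$ exactly (each coordinate contributes $\mathbb{E}\{\min(g_i, 0)^2\} = 1/2$), is our own illustration and not from the paper. The function name `mean_square_distance` is hypothetical.

```python
import numpy as np
from typing import Callable

def mean_square_distance(project: Callable[[np.ndarray], np.ndarray],
                         n: int, num_samples: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of D(C) = E[dist(g, C)^2] with g ~ N(0, I_n).

    `project` must return the Euclidean projection of g onto a closed convex set C,
    so that dist(g, C) = ||g - project(g)||_2.
    """
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((num_samples, n))
    residual = g - np.apply_along_axis(project, 1, g)
    return float(np.mean(np.sum(residual**2, axis=1)))

# Example: C = nonnegative orthant, whose projection is max(g, 0); here D(C) = n / 2.
n = 10
estimate = mean_square_distance(lambda g: np.maximum(g, 0.0), n)
print(estimate)  # close to 5.0
```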

Throughout the paper, we use $[k] := \{1, 2, \dots, k\}$ where $k$ is a positive integer, and $[b, e] := \{b, b+1, \dots, e\}$ where $e \ge b$. Similarly, $(b, e) := \{b+1, b+2, \dots, e-1\}$. Let $S$ be a subset of $[n-1]$; then $S^c$ denotes the complement of $S$ with respect to $[n-1]$. We use $|S|$ to denote the cardinality of the set $S$.

Let $u \in \mathbb{R}^{n-1}$ be a vector and $S$ be a subset of the index set $[n-1]$; then $u_S \in \mathbb{R}^{n-1}$ is the vector such that

$$(u_S)_i = \begin{cases} u_i & \text{if } i \in S, \\ 0 & \text{if } i \notin S. \end{cases} \tag{7}$$

We use $\bar{u}_S \in \mathbb{R}^{|S|}$ to denote the shortened version of $u_S$, obtained by deleting all entries whose indices are not in $S$. To be more explicit, let $S = \{s_1, s_2, \dots, s_{|S|}\}$ with $s_1 < s_2 < \dots < s_{|S|}$; then

$$(\bar{u}_S)_i = u_{s_i}, \quad \forall i \in [|S|]. \tag{8}$$

Let $M \in \mathbb{R}^{(n-1)\times(n-1)}$ be a matrix, and $S$ and $T$ be subsets of $[n-1]$; then $M_{S,T} \in \mathbb{R}^{|S|\times|T|}$ is the matrix produced by deleting all rows not in $S$ and all columns not in $T$ from $M$. To be explicit, let $S = \{s_1, s_2, \dots, s_{|S|}\}$ and $T = \{t_1, t_2, \dots, t_{|T|}\}$, both with increasing elements; then

$$(M_{S,T})_{ij} = M_{s_i, t_j}, \quad \forall i \in [|S|],\ \forall j \in [|T|]. \tag{9}$$

We also write $M_{S,T}$ as $M_{S,\cdot}$ if $T = [n-1]$. Similarly, if $S = [n-1]$, we write $M_{S,T}$ as $M_{\cdot,T}$.
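For readers who want to experiment with the notation (7)-(9), here is a small NumPy sketch. The vector, matrix and index set are hypothetical; note that NumPy indices are 0-based, whereas the text uses 1-based indices.

```python
import numpy as np

# Hypothetical example: n - 1 = 6 and S = {2, 3, 5} in the 1-based notation of the text.
u = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
S = np.array([2, 3, 5]) - 1                 # 0-based positions, increasing as in (8)
Sc = np.setdiff1d(np.arange(u.size), S)     # complement S^c, also sorted increasingly

u_S = np.zeros_like(u)                      # zero-padded u_S in R^{n-1}, eq. (7)
u_S[S] = u[S]
u_bar_S = u[S]                              # shortened version in R^{|S|}, eq. (8)

M = np.arange(36.0).reshape(6, 6)
M_S_Sc = M[np.ix_(S, Sc)]                   # M_{S,S^c}, eq. (9)
M_S_dot = M[S, :]                           # M_{S,.}: keep all columns

# u = u_S + u_{S^c} holds for any index set S.
u_Sc = np.zeros_like(u)
u_Sc[Sc] = u[Sc]
assert np.array_equal(u_S + u_Sc, u)
print(u_S, u_bar_S, M_S_Sc.shape, M_S_dot.shape)
```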
B. Phase Transition for the AMP

In [?], to recover sparse-gradient signals from undersampled measurements, the authors proposed an iterative approximate message passing algorithm, called the TV-AMP algorithm, which uses TV denoisers in each iteration. The authors further connected the TV-AMP algorithm with the minimax denoising problem. In the denoising problem, one observes $y = x^* + z$, in which $z$ is a noise vector with i.i.d. standard Gaussian entries (zero mean, unit variance), and tries to recover $x^*$ from the noisy observation $y$. In particular, [?] conjectured that the minimax MSE of the denoising problem correctly predicts the phase transition of the TV-AMP algorithm. Moreover, it was observed that the minimax MSE of the denoising problem matches both the empirical phase transition of (1) and the empirical phase transition of the AMP algorithm. Let $m_{\mathrm{AMP}}$ be the number of observations needed by the AMP algorithm; [?] numerically showed that, as soon as $m_{\mathrm{AMP}} > n \cdot M_{\mathrm{denoiser}}$, the AMP algorithm succeeds in recovering $x^*$ with high probability. Here $M_{\mathrm{denoiser}}$ is the per-coordinate minimax mean squared error of the denoising problem when one observes $y = x^* + z$ and uses TV-penalized least-squares denoisers. However, in [?], the analytically derived phase transition for the AMP algorithm depends on the assumption that the AMP state evolution is correct, and proving that this assumption holds for TV-AMP remains open. Moreover, it is unknown whether the phase transition of the AMP algorithm theoretically matches the phase transition of TV minimization (1). Thus characterizing the phase transition of TV minimization remains open, even though we have a phase transition formula from [?] that matches the empirical performance of TV minimization.

In another line of work using convex geometry, [?] showed that the minimax MSE $M_{\mathrm{denoiser}}$ is closely related to $\min_{\lambda \ge 0} D(\lambda \partial f(x))$, where $\partial f(x)$ is the subdifferential of $f(x)$ at the underlying signal $x$. In particular, [?] showed that $n \cdot M_{\mathrm{denoiser}} \approx \min_{\lambda \ge 0} D(\lambda \partial f(x))$. However, it is still unknown whether $\min_{\lambda \ge 0} D(\lambda \partial f(x))$ provides the phase transition for AMP or for TV minimization (1).

C. Phase Transition Based on Gaussian Width Calculation

Using the "escape through the mesh" lemma, recent works [?] have shown that, for a proper convex function $f(\cdot)$, $D(\mathrm{cone}(\partial f(x_0)))$ (where $x_0$ is the original signal) is the phase transition threshold on the number of Gaussian measurements needed for the optimization problem (2) to recover $x_0$. As discussed above, while the formula $D(\mathrm{cone}(\partial f(x_0)))$ is applicable to the TV minimization problem, it is not clear how to compute it for the TV semi-norm $f(x)$, which is a non-separable function. This is in contrast to the Gaussian width calculations for separable penalty functions such as the $\ell_1$ norm.

D. Central Issue and Our Approach 

At this point, it is not known whether $\min_{\lambda \ge 0} D(\lambda \partial f(x)) \approx D(\mathrm{cone}(\partial f(x)))$ for the TV regularizer $f(x)$. Thus it is not clear whether the minimax MSE result derived in [?] directly gives the phase transition of TV minimization. In fact, when $f(x)$ is a norm of $x$, it is known that $\min_{\lambda \ge 0} D(\lambda \partial f(x)) \approx D(\mathrm{cone}(\partial f(x)))$ [?]. One may thus wonder whether we can show this approximate equality for the TV regularizer by directly applying (3.5) in [?] or (4.3) in [?]. However, there are two obstacles to directly applying those two equations. First, the TV regularizer $f(x)$ is not a norm but a semi-norm. Second, even if we go ahead and apply (3.5) in [?] or (4.3) in [?] to bound the Gaussian width of the descent cone of $f(x)$, the approximation error is too big, since $1/f(x/\|x\|_2)$ can be arbitrarily big for an $n$-dimensional signal $x$ when $f(x)$ is the total variation semi-norm (for instance, $f$ vanishes on every constant signal, so $f(x/\|x\|_2)$ can be arbitrarily close to $0$ for nearly constant $x$).

In this paper, we will show that

$$\min_{\lambda \ge 0} D(\lambda \partial f(x)) \approx D(\mathrm{cone}(\partial f(x))) \tag{10}$$

for the TV regularizer, and that $\min_{\lambda \ge 0} D(\lambda \partial f(x))$ is indeed the phase transition of TV minimization.

In order to show (10), we build on Proposition 1 of [?]. In particular, we show that $f(x)$ satisfies the weak decomposability condition defined in [?], and hence we can use Proposition 1 of [?] to obtain

$$\min_{\lambda \ge 0} D(\lambda \partial f(x)) \le D(\mathrm{cone}(\partial f(x))) + 6, \tag{11}$$

which, coupled with the fact that

$$\min_{\lambda \ge 0} D(\lambda \partial f(x)) \ge D(\mathrm{cone}(\partial f(x))),$$

proves (10).

III. Main Result 

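In this section, we prove that $\min_{\lambda \ge 0} D(\lambda \partial f(x))$ is the phase transition of (1) by showing that (10) holds. For any given nonzero vector $x \in \mathbb{R}^n$, define $v \in \mathbb{R}^{n-1}$ with

$$v_i \;\begin{cases} = 1 & \text{if } x_{i+1} < x_i, \\ = -1 & \text{if } x_{i+1} > x_i, \\ \in [-1, 1] & \text{if } x_{i+1} = x_i. \end{cases} \tag{12}$$

Let $\mathcal{V}$ denote the set of $v$'s that satisfy (12); then $\partial f(x)$ can be written as

$$\partial f(x) = \{B^\top v : v \in \mathcal{V}\}. \tag{13}$$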
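As a numerical sanity check of (12)-(13), the sketch below draws an arbitrary $v \in \mathcal{V}$ for a hypothetical signal and verifies the subgradient inequality $f(y) \ge f(x) + \langle B^\top v, y - x\rangle$ at random points $y$. It assumes the convention $(Bx)_i = x_i - x_{i+1}$ of (3); the helper names are hypothetical and the check is illustrative, not a proof.

```python
import numpy as np

def difference_matrix(n: int) -> np.ndarray:
    """B from (3), with (Bx)_i = x_i - x_{i+1} (0-based indices)."""
    B = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    B[idx, idx] = 1.0
    B[idx, idx + 1] = -1.0
    return B

def random_subgradient(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return B^T v for a v satisfying (12): sign((Bx)_i), free in [-1, 1] if (Bx)_i = 0."""
    B = difference_matrix(x.size)
    Bx = B @ x
    v = np.sign(Bx)
    flat = Bx == 0
    v[flat] = rng.uniform(-1.0, 1.0, size=int(flat.sum()))
    return B.T @ v

def tv(z: np.ndarray) -> float:
    return float(np.abs(np.diff(z)).sum())

rng = np.random.default_rng(1)
x = np.array([2.0, 2.0, 2.0, -1.0, -1.0, 4.0, 4.0, 4.0])   # hypothetical signal
w = random_subgradient(x, rng)

# Subgradient inequality f(y) >= f(x) + <w, y - x>, checked at random points y.
for _ in range(1000):
    y = x + rng.standard_normal(x.size)
    assert tv(y) >= tv(x) + w @ (y - x) - 1e-9
print("subgradient inequality holds at all sampled points")
```

The inequality holds because $\|By\|_1 \ge v^\top By$ whenever $\|v\|_\infty \le 1$, while $v^\top Bx = \|Bx\|_1$ for every $v \in \mathcal{V}$.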

Definition 1. For $x \neq 0$, the set $\partial f(x)$ is said to satisfy the weak decomposability assumption if there exists $w_0 \in \partial f(x)$ such that

$$\langle w - w_0, w_0 \rangle = 0, \tag{14}$$

simultaneously for all $w \in \partial f(x)$.

Using (13), we can rewrite (14) as

$$\exists\, v_0 \in \mathcal{V} \ \text{s.t.}\ \langle B^\top v - B^\top v_0, B^\top v_0 \rangle = 0, \quad \forall v \in \mathcal{V}. \tag{15}$$

We have the following result regarding the weak decomposability of $\partial f(x)$.
Lemma 1. For any given nonzero $x \in \mathbb{R}^n$, $\partial f(x)$ satisfies the weak decomposability assumption.

Proof: To check the weak decomposability assumption, we need to check whether we can always find a $v_0 \in \mathcal{V}$ that satisfies (15).

Note that $BB^\top$ is symmetric. Since $\langle B^\top v - B^\top v_0, B^\top v_0 \rangle = v_0^\top BB^\top v - v_0^\top BB^\top v_0$, (15) is equivalent to

$$\exists\, v_0 \in \mathcal{V} \ \text{s.t.}\ v_0^\top BB^\top v = v_0^\top BB^\top v_0, \quad \forall v \in \mathcal{V}. \tag{16}$$

(16) indicates that (15) is satisfied if and only if we can find a $v_0 \in \mathcal{V}$ such that $v_0^\top BB^\top v$ is a constant for all $v \in \mathcal{V}$ (taking $v = v_0$ shows that this constant is necessarily $v_0^\top BB^\top v_0$).

Define the set of indices $S := \{i \in [n-1] : x_i = x_{i+1}\}$. If $S = \emptyset$, (15) holds trivially, as in this case $\|Bx\|_1$ is differentiable at $x$ and $\mathcal{V}$ is a singleton set. In the following we focus on the case $S \neq \emptyset$.

When $S \neq \emptyset$, $S$ can be written as a union of groups of consecutive indices, $S = \bigcup_{i=1}^{K+1} [b_i, e_i]$, where $K+1$ is the number of groups (each group $[b_i, e_i]$ corresponds to a maximal interval $[b_i, e_i + 1]$ on which the elements of $x$ have the same value), $b_i \le e_i$, $\forall i \in [K+1]$, and $b_{i+1} - e_i > 1$, $\forall i \in [K]$. $S$ can also be expressed explicitly as $S = \{s_1, s_2, \dots, s_{|S|}\}$ with increasing elements. In a similar manner, we write $S^c = \{(S^c)_1, (S^c)_2, \dots, (S^c)_{|S^c|}\}$ with increasing elements.

Using the notation introduced in (7), we can write $v = v_S + v_{S^c}$, and hence

$$v_0^\top BB^\top v = v_0^\top BB^\top v_S + v_0^\top BB^\top v_{S^c}. \tag{17}$$
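Notice that

$$(v_{S^c})_i = \begin{cases} 0 & \text{if } x_{i+1} = x_i, \\ 1 & \text{if } x_{i+1} < x_i, \\ -1 & \text{if } x_{i+1} > x_i, \end{cases} \tag{18}$$

where $i \in [n-1]$. Given $x$, $v_{S^c}$ is fixed, and hence $v_0^\top BB^\top v_{S^c}$ is fixed. Since $(v_S)_i$ can be any real number in $[-1, 1]$ for $i \in S$, a necessary and sufficient condition for the right-hand side of (17) to be a constant is

$$v_0^\top BB^\top v_S = 0, \quad \forall v \in \mathcal{V},$$

since a linear function of $v_S$ that is constant over the box $[-1, 1]^{|S|}$ must equal its value at $v_S = 0$, namely zero. Using the notation introduced in (8) and (9), the equation above can be written as

$$\begin{aligned}
& v_0^\top (BB^\top)_{\cdot,S}\, \bar{v}_S = 0, \quad \forall v \in \mathcal{V} \\
\iff\ & v_0^\top (BB^\top)_{\cdot,S} = 0 \\
\iff\ & (BB^\top)_{S,\cdot}\, v_0 = 0 \\
\iff\ & (BB^\top)_{S,S}\, \bar{v}_{0,S} + (BB^\top)_{S,S^c}\, \bar{v}_{0,S^c} = 0 \\
\iff\ & (BB^\top)_{S,S}\, \bar{v}_{0,S} = -(BB^\top)_{S,S^c}\, \bar{v}_{0,S^c}.
\end{aligned} \tag{19}$$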


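If $(BB^\top)_{S,S}$ is invertible, from (19) we obtain

$$\bar{v}_{0,S} = -\left((BB^\top)_{S,S}\right)^{-1}(BB^\top)_{S,S^c}\, \bar{v}_{0,S^c} = -\left((BB^\top)_{S,S}\right)^{-1}(BB^\top)_{S,S^c}\, \bar{v}_{S^c}, \tag{20}$$

where the second equality holds because every element of $\mathcal{V}$, including $v_0$, has the same entries on $S^c$, given by (18).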

Hence, if the answers to the following two questions are both yes:

1) Is $(BB^\top)_{S,S}$ invertible?
2) Is $\bar{v}_{0,S}$ produced by (20) feasible? Or equivalently, does each element of $\bar{v}_{0,S}$ fall into the interval $[-1, 1]$?

then, combining (20) with $v_{0,S^c}$ given by (18), we find a feasible $v_0$ that satisfies the weak decomposability assumption.

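To answer the first question, we need to study the structure of $(BB^\top)_{S,S}$. Note that $BB^\top \in \mathbb{R}^{(n-1)\times(n-1)}$ is symmetric, and hence so is $(BB^\top)^{-1} \in \mathbb{R}^{(n-1)\times(n-1)}$. Let $\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ denote the floor and ceiling operators, respectively. Here we give the exact forms of $BB^\top$, $(BB^\top)^{-1}$ and $(BB^\top)_{S,S}$; for $(BB^\top)^{-1}$ we only give the upper triangular part in (22) due to its symmetry.

$$BB^\top = \begin{pmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{pmatrix}, \tag{21}$$

$$(BB^\top)^{-1} = \frac{1}{n}\begin{pmatrix} n-1 & n-2 & n-3 & \cdots & 1 \\ & 2(n-2) & 2(n-3) & \cdots & 2 \\ & & \ddots & & \vdots \\ & & & 2(n-2) & n-2 \\ & & & & n-1 \end{pmatrix}, \tag{22}$$

whose largest diagonal entry, attained at $i = \lfloor n/2 \rfloor$, equals $\lfloor n/2 \rfloor \lceil n/2 \rceil / n$, and

$$(BB^\top)_{S,S} = \begin{pmatrix}
2 & -\mathbb{1}\{s_2 - s_1 = 1\} & & \\
-\mathbb{1}\{s_2 - s_1 = 1\} & 2 & \ddots & \\
& \ddots & \ddots & -\mathbb{1}\{s_{|S|} - s_{|S|-1} = 1\} \\
& & -\mathbb{1}\{s_{|S|} - s_{|S|-1} = 1\} & 2
\end{pmatrix}, \tag{23}$$

in which $\mathbb{1}\{\cdot\}$ is the indicator function. Entrywise,

$$(BB^\top)_{ij} = \begin{cases} 2 & \text{if } i = j, \\ -1 & \text{if } |i - j| = 1, \\ 0 & \text{otherwise,} \end{cases} \tag{24}$$

$$\left((BB^\top)^{-1}\right)_{ij} = \begin{cases} \dfrac{i(n-j)}{n} & \text{if } i \le j, \\[6pt] \dfrac{j(n-i)}{n} & \text{if } i > j. \end{cases} \tag{25}$$

From (21), we have

$$\left((BB^\top)_{S,S}\right)_{ij} = \begin{cases} 2 & \text{if } i = j, \\ -\mathbb{1}\{|s_i - s_j| = 1\} & \text{if } |i - j| = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{26}$$

Recall that $S = \bigcup_{i=1}^{K+1} [b_i, e_i]$, where $b_i \le e_i$, $\forall i \in [K+1]$, and $b_{i+1} - e_i > 1$, $\forall i \in [K]$. Let $l_i := e_i - b_i + 1$ denote the length of the $i$th group; hence $|S| = \sum_{i=1}^{K+1} l_i$. For a positive integer $l$, define the matrix $H(l) \in \mathbb{R}^{l \times l}$ by

$$\left(H(l)\right)_{ij} = \begin{cases} 2 & \text{if } i = j, \\ -1 & \text{if } |i - j| = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{27}$$

Since indices within a group are consecutive while indices in different groups differ by more than $1$, (26) shows that $(BB^\top)_{S,S}$ and its inverse are block diagonal:

$$(BB^\top)_{S,S} = \begin{pmatrix} H(l_1) & & \\ & \ddots & \\ & & H(l_{K+1}) \end{pmatrix}, \qquad \left((BB^\top)_{S,S}\right)^{-1} = \begin{pmatrix} H(l_1)^{-1} & & \\ & \ddots & \\ & & H(l_{K+1})^{-1} \end{pmatrix}, \tag{28}$$

where, since $BB^\top = H(n-1)$, applying (25) with $n - 1$ replaced by $l$ gives

$$\left(H(l)^{-1}\right)_{ij} = \begin{cases} \dfrac{i(l+1-j)}{l+1} & \text{if } i \le j, \\[6pt] \dfrac{j(l+1-i)}{l+1} & \text{if } i > j. \end{cases} \tag{29}$$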
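The closed forms (25) and (29) and the block structure (28) are easy to confirm numerically. The sketch below (hypothetical helper names, 0-based indices, a made-up example signal) checks (29) for small $l$ and then verifies that $(BB^\top)_{S,S}$ is block diagonal with blocks $H(l_i)$; it relies on $BB^\top = H(n-1)$ from (21).

```python
import numpy as np

def H(l: int) -> np.ndarray:
    """H(l) from (27): 2 on the diagonal, -1 on the two off-diagonals."""
    return 2 * np.eye(l) - np.eye(l, k=1) - np.eye(l, k=-1)

def H_inv_closed_form(l: int) -> np.ndarray:
    """(29) with 1-based i, j: min(i, j) * (l + 1 - max(i, j)) / (l + 1)."""
    i, j = np.meshgrid(np.arange(1, l + 1), np.arange(1, l + 1), indexing="ij")
    return np.minimum(i, j) * (l + 1 - np.maximum(i, j)) / (l + 1)

# (29), and hence (25) with l = n - 1, inverts H(l).
for l in range(1, 9):
    assert np.allclose(H_inv_closed_form(l) @ H(l), np.eye(l))

# Block structure (28) for a hypothetical signal with n = 10.
x = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 5.0, 2.0, 2.0, 2.0, 2.0])
n = x.size
BBt = H(n - 1)                                        # BB^T = H(n - 1), cf. (21)
S = np.flatnonzero(x[:-1] == x[1:])                   # 0-based i with x_i = x_{i+1}
groups = np.split(S, np.flatnonzero(np.diff(S) > 1) + 1)   # maximal runs [b_i, e_i]
lengths = [len(g) for g in groups]

expected = np.zeros((S.size, S.size))
start = 0
for l in lengths:
    expected[start:start + l, start:start + l] = H(l)
    start += l
assert np.allclose(BBt[np.ix_(S, S)], expected)
print("group lengths l_i:", lengths)
```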


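(28) implies that the answer to the first question is yes. Now, we investigate the second question. For that, we first study the structure of $(BB^\top)_{S,S^c}$. Note that $S \cap S^c = \emptyset$, so no diagonal entry of $BB^\top$ appears in $(BB^\top)_{S,S^c}$, and by (24)

$$\left((BB^\top)_{S,S^c}\right)_{ij} = \begin{cases} -1 & \text{if } |s_i - (S^c)_j| = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{30}$$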


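Since $((BB^\top)_{S,S})^{-1}$ is block diagonal, we can also divide $(BB^\top)_{S,S^c}$ into row blocks corresponding to the blocks of $((BB^\top)_{S,S})^{-1}$:

$$(BB^\top)_{S,S^c} = \begin{pmatrix} (BB^\top)^{(1)}_{S,S^c} \\ \vdots \\ (BB^\top)^{(K+1)}_{S,S^c} \end{pmatrix}, \tag{31}$$

where $(BB^\top)^{(i)}_{S,S^c} \in \mathbb{R}^{l_i \times |S^c|}$ denotes the $i$th block. Now we have

$$\left((BB^\top)_{S,S}\right)^{-1}(BB^\top)_{S,S^c} = \begin{pmatrix} H(l_1)^{-1}(BB^\top)^{(1)}_{S,S^c} \\ \vdots \\ H(l_{K+1})^{-1}(BB^\top)^{(K+1)}_{S,S^c} \end{pmatrix}. \tag{32}$$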

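Next, we conduct a closer analysis of $(BB^\top)^{(i)}_{S,S^c}$, $\forall i \in [K+1]$. Note that the group of length $l_i$ corresponds to the indices $[b_i, e_i]$. Due to the condition in (30), $-1$ can only appear at a position $(j, t)$ with $s_j = b_i$ and $(S^c)_t = b_i \pm 1$, or with $s_j = e_i$ and $(S^c)_t = e_i \pm 1$. Now, consider two cases:

Case 1: If $b_i = e_i$, then $b_i + 1 = e_i + 1$ and $e_i - 1 = b_i - 1$. So $-1$ can appear in at most two positions (the columns corresponding to $b_i - 1$ and $b_i + 1$, when these indices lie in $[n-1]$), and since $H(l_i) = H(1) = 2$, the resulting row vector $H(l_i)^{-1}(BB^\top)^{(i)}_{S,S^c}$ has at most two nonzero elements, both equal to $-1/2$.

Case 2: If $b_i \neq e_i$, then $b_i + 1 \in [b_i, e_i] \subseteq S$ and $e_i - 1 \in [b_i, e_i] \subseteq S$. So $-1$ can appear in at most two positions, which must lie in the first row and the last row of $(BB^\top)^{(i)}_{S,S^c}$, respectively (in the columns corresponding to $b_i - 1$ and $e_i + 1$), since the points in $(b_i, e_i)$ have no neighbours in $S^c$. By (29), the first and last elements of each row, say row $r$, of $H(l_i)^{-1}$ are $\frac{l_i + 1 - r}{l_i + 1}$ and $\frac{r}{l_i + 1}$. Hence each row $r$ of the resulting matrix $H(l_i)^{-1}(BB^\top)^{(i)}_{S,S^c}$ has at most two nonzero elements, namely $-\frac{l_i + 1 - r}{l_i + 1}$ and $-\frac{r}{l_i + 1}$. Note that

$$\frac{l_i + 1 - r}{l_i + 1} + \frac{r}{l_i + 1} = 1 \quad \text{and} \quad 0 < \frac{l_i + 1 - r}{l_i + 1},\ \frac{r}{l_i + 1} < 1.$$

Combining these two cases, we know that each row of $((BB^\top)_{S,S})^{-1}(BB^\top)_{S,S^c}$ has at most two nonzero elements, which lie in $[-1, 0)$ and whose absolute values sum to at most $1$. Since each element of $\bar{v}_{0,S^c}$ falls into $[-1, 1]$, each element of $\bar{v}_{0,S}$ given by (20) has absolute value at most $1$, so $\bar{v}_{0,S}$ is always feasible. This implies that the answer to the second question is also yes.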
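The row properties derived in the two cases above can be checked on random piece-wise constant signals with the following sketch. The signal generator is our own choice for illustration (small integer levels make ties frequent), the tolerances are arbitrary, and the helper names are hypothetical; it tests, but does not prove, the claims.

```python
import numpy as np

def solved_blocks(x: np.ndarray) -> np.ndarray:
    """Return P = ((BB^T)_{S,S})^{-1} (BB^T)_{S,S^c} for a signal x (0-based indices)."""
    n = x.size
    BBt = 2 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)
    S = np.flatnonzero(x[:-1] == x[1:])
    Sc = np.flatnonzero(x[:-1] != x[1:])
    P = np.linalg.solve(BBt[np.ix_(S, S)], BBt[np.ix_(S, Sc)])
    P[np.abs(P) < 1e-12] = 0.0          # remove round-off before counting non-zeros
    return P

def check_rows(P: np.ndarray) -> None:
    """Rows have <= 2 non-zeros, entries in [-1, 0], and sum of |entries| <= 1."""
    assert np.all(np.count_nonzero(P, axis=1) <= 2)
    assert np.all((P >= -1.0 - 1e-12) & (P <= 0.0))
    assert np.all(np.abs(P).sum(axis=1) <= 1.0 + 1e-12)

rng = np.random.default_rng(2)
for _ in range(200):
    x = rng.integers(0, 3, size=rng.integers(3, 30)).astype(float)
    # Need S non-empty (some tie) and S^c non-empty (x not constant).
    if np.any(x[:-1] == x[1:]) and np.any(x[:-1] != x[1:]):
        check_rows(solved_blocks(x))
print("row properties hold for all sampled signals")
```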

As a result, by combining (18) and (20), we find a $v_0$ that satisfies the weak decomposability assumption. The proof of the lemma is complete. ■
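The whole construction in the proof can be exercised end to end: the sketch below forms $v_0$ from (18) and (20) and then checks feasibility and condition (16) against many randomly drawn $v \in \mathcal{V}$. It assumes the convention $(Bx)_i = x_i - x_{i+1}$ of (3) and a nonconstant signal with $S \neq \emptyset$; the helper names and the example signal are hypothetical.

```python
import numpy as np

def bbt(n: int) -> np.ndarray:
    """BB^T from (21): (n-1) x (n-1) tridiagonal with 2 on the diagonal, -1 off it."""
    return 2 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)

def construct_v0(x: np.ndarray) -> np.ndarray:
    """v_0 from (18) and (20); assumes x is nonconstant and S is non-empty."""
    Bx = x[:-1] - x[1:]                     # (Bx)_i = x_i - x_{i+1}, as in (3)
    M = bbt(x.size)
    S = np.flatnonzero(Bx == 0)
    Sc = np.flatnonzero(Bx != 0)
    v0 = np.sign(Bx)                        # S^c entries fixed by (18); zeros on S for now
    v0[S] = -np.linalg.solve(M[np.ix_(S, S)], M[np.ix_(S, Sc)] @ v0[Sc])   # eq. (20)
    return v0

x = np.array([3.0, 3.0, 1.0, 1.0, 1.0, 4.0, 2.0, 2.0])   # hypothetical signal
Bx = x[:-1] - x[1:]
M = bbt(x.size)
v0 = construct_v0(x)
assert np.all(np.abs(v0) <= 1.0 + 1e-12)   # feasibility: v0 lies in V

rng = np.random.default_rng(3)
target = v0 @ M @ v0
for _ in range(1000):
    v = np.sign(Bx)                         # random v in V: free entries drawn from [-1, 1]
    v[Bx == 0] = rng.uniform(-1.0, 1.0, size=int(np.count_nonzero(Bx == 0)))
    assert np.isclose(v0 @ M @ v, target)   # condition (16)
print("v0 =", np.round(v0, 3))
```

For the signal $x = (0, 0, 0, 1, 1, 1)$ the construction gives $v_0 = (-\tfrac{1}{3}, -\tfrac{2}{3}, -1, -\tfrac{2}{3}, -\tfrac{1}{3})$: inside each flat group, $v_0$ interpolates linearly between the fixed signs of its neighbours (taken as $0$ beyond the ends of the signal).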

With Lemma 1, we are ready to state the main result.

Theorem 1. The phase transition of the TV minimization problem (1) is $\min_{\lambda \ge 0} D(\lambda \partial f(x))$.

Proof: We will use Proposition 1 in [?], which applies to any convex complexity measure. As Lemma 1 shows that $\partial f(x)$ satisfies the weak decomposability assumption, using Proposition 1 in [?] we have

$$\begin{aligned}
\min_{\lambda \ge 0} D(\lambda \partial f(x)) &= \min_{\lambda \ge 0} \mathbb{E}\Big\{\inf_{u \in \lambda \partial f(x)} \|g - u\|_2^2\Big\} \\
&\le \mathbb{E}\Big\{\min_{\lambda \ge 0} \inf_{u \in \lambda \partial f(x)} \|g - u\|_2^2\Big\} + 6 \\
&= D(\mathrm{cone}(\partial f(x))) + 6.
\end{aligned} \tag{33}$$


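We also have, since the minimum of an expectation is at least the expectation of the minimum,

$$\begin{aligned}
\min_{\lambda \ge 0} D(\lambda \partial f(x)) &= \min_{\lambda \ge 0} \mathbb{E}\Big\{\inf_{u \in \lambda \partial f(x)} \|g - u\|_2^2\Big\} \\
&\ge \mathbb{E}\Big\{\min_{\lambda \ge 0} \inf_{u \in \lambda \partial f(x)} \|g - u\|_2^2\Big\} \\
&= D(\mathrm{cone}(\partial f(x))).
\end{aligned} \tag{34}$$

Combining (33) and (34), we have

$$D(\mathrm{cone}(\partial f(x))) \le \min_{\lambda \ge 0} D(\lambda \partial f(x)) \le D(\mathrm{cone}(\partial f(x))) + 6. \tag{35}$$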

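Since $\min_{\lambda \ge 0} D(\lambda \partial f(x))$ grows proportionally with $n$ when the sparsity of the gradient grows proportionally with $n$, as shown in [?], the additive approximation error $6$ in (35) is negligible: relative to $\min_{\lambda \ge 0} D(\lambda \partial f(x))$ it is of order $1/n$. This completes our proof. ■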
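Theorem 1 identifies $\min_{\lambda \ge 0} D(\lambda \partial f(x))$ as the phase transition, up to the additive constant in (35). The sketch below gives a rough Monte Carlo estimate of this quantity for a small hypothetical signal. Assumptions: the squared distance to $\lambda \partial f(x)$ is computed by projected gradient descent on the box-constrained least-squares problem in $v_S$ implied by (13) (a generic solver choice of ours), $\lambda$ is searched over a coarse grid, and the sample size is small, so the output is only an approximation. This is not the minimax-MSE computation used by Donoho, Johnstone and Montanari, and the helper names are hypothetical.

```python
import numpy as np

def difference_matrix(n: int) -> np.ndarray:
    """B from (3), with (Bx)_i = x_i - x_{i+1} (0-based indices)."""
    B = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    B[idx, idx] = 1.0
    B[idx, idx + 1] = -1.0
    return B

def mean_dist2(G: np.ndarray, B: np.ndarray, Bx: np.ndarray, lam: float,
               iters: int = 500) -> float:
    """Monte Carlo estimate of D(lam * df(x)) from the Gaussian samples in the rows of G.

    By (13), lam * df(x) = {lam * B^T v : v in V}, with v fixed to sign(Bx) on S^c and
    free in [-1, 1] on S, so each squared distance is a box-constrained least-squares
    problem in v_S. It is solved approximately by projected gradient descent.
    """
    S = Bx == 0
    R = G - lam * (B.T @ np.sign(Bx))      # residuals with v_S = 0, one row per sample
    if lam == 0 or not S.any():
        return float(np.mean(np.sum(R**2, axis=1)))
    A = lam * B[S].T                       # n x |S|: maps v_S to lam * B_S^T v_S
    step = 1.0 / (4.0 * lam**2)            # 1/L, since ||A^T A||_2 <= 4 lam^2
    W = np.zeros((G.shape[0], A.shape[1]))
    for _ in range(iters):
        W = np.clip(W + step * (R - W @ A.T) @ A, -1.0, 1.0)
    return float(np.mean(np.sum((R - W @ A.T) ** 2, axis=1)))

x = np.repeat([0.0, 2.0, -1.0, 1.0], 10)   # hypothetical signal: n = 40, three jumps
n = x.size
B = difference_matrix(n)
Bx = B @ x
G = np.random.default_rng(0).standard_normal((200, n))
lams = np.linspace(0.0, 3.0, 16)
D = [mean_dist2(G, B, Bx, lam) for lam in lams]
k = int(np.argmin(D))
print(f"lambda ~ {lams[k]:.1f}, min D ~ {D[k]:.2f}, min D / n ~ {D[k] / n:.3f}")
```

Reusing the same Gaussian samples for every $\lambda$ keeps the estimated curve $\lambda \mapsto D(\lambda \partial f(x))$ smooth, which makes the grid minimum more stable.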


IV. Conclusion 

We have verified that the TV regularizer satisfies the weak decomposability condition, and proved that $\min_{\lambda \ge 0} D(\lambda \partial f(x)) \approx D(\mathrm{cone}(\partial f(x)))$ for the TV regularizer $f(x)$. Thus the minimax MSE result derived by Donoho, Johnstone and Montanari [?] directly gives the phase transition of total variation minimization.